Overview and Motivation: Provide an overview of the project goals and the motivation for it. Consider that this will be read by people who did not see your project proposal.

Related Work: Anything that inspired you, such as a paper, a web site, or something we discussed in class.

Initial Questions: What questions are you trying to answer? How did these questions evolve over the course of the project? What new questions did you consider in the course of your analysis?

Data: Source, scraping method, cleanup, etc.

Exploratory Analysis: What visualizations did you use to look at your data in different ways? What are the different statistical methods you considered? Justify the decisions you made, and show any major changes to your ideas. How did you reach these conclusions?

Final Analysis: What did you learn about the data? How did you answer the questions? How can you justify your answers? Note that 1 type of analysis per team member is required. A Shiny app counts as a type of analysis.

Load Packages

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## -- Attaching packages ----------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v stringr 1.4.0
## v tidyr   1.1.2     v forcats 0.5.0
## v readr   1.4.0
## -- Conflicts -------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(pdftools)
## Using poppler version 0.73.0
library(readr)
library(stringr)
library(ggthemes)
library(shiny)
library(shinyBS)
library(RColorBrewer)
library(shinydashboard)
## 
## Attaching package: 'shinydashboard'
## The following object is masked from 'package:graphics':
## 
##     box
library(sp)
library(rgeos)
## rgeos version: 0.5-5, (SVN revision 640)
##  GEOS runtime version: 3.8.0-CAPI-1.13.1 
##  Linking to sp version: 1.4-4 
##  Polygon checking: TRUE
library(rgdal)
## rgdal: version: 1.5-18, (SVN revision 1082)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
## Path to GDAL shared files: C:/Users/zay-z/Documents/R/win-library/4.0/rgdal/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 6.3.1, February 10th, 2020, [PJ_VERSION: 631]
## Path to PROJ shared files: C:/Users/zay-z/Documents/R/win-library/4.0/rgdal/proj
## Linking to sp version:1.4-4
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
library(maptools)
## Checking rgeos availability: TRUE
library(leaflet)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(grid)

MCH Data Wrangling *Low BW Only (for Map)

We wanted to create a map to visualize the pattern of low birth weight in California. I initially used the CDC WONDER database, but its county-level information was very limited: it grouped rural and small counties under “unidentified counties.” I therefore used another source, the California Open Data Portal (https://data.ca.gov/dataset/live-births-with-low-and-very-low-birthweight). The data cleaning process follows.

#Data Wrangling for Year 2014-2018 Data for Map.  
lbwdata<-read.csv("./low-and-very-low-birthweight-by-county-2014-2018 (1).csv", header = TRUE, stringsAsFactors = FALSE)
lbwdata <- lbwdata %>% mutate(County = str_to_title(County))
lbwdata$Events[is.na(lbwdata$Events)] <- 0
lbwdata <- lbwdata %>% group_by(Year, County, Total.Births) %>% summarize(Events = sum(Events)) 
## `summarise()` regrouping output by 'Year', 'County' (override with `.groups` argument)
lbwdata <- lbwdata %>% filter(str_to_lower(County) != "california") # remove the statewide total row (County is title case after str_to_title)
lbwdata <- lbwdata %>% mutate(Rate = Events/Total.Births)
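The cleaning steps above can be tried on a small made-up table before running them on the full file. The sketch below is illustrative only: the `toy` data frame and its values are invented, and only the column names (`Year`, `County`, `Total.Births`, `Events`) are taken from the code above. It compares county names case-insensitively, so the statewide total row is dropped regardless of capitalization, and it passes `.groups = "drop"` so that `summarize()` does not print the regrouping message.

```r
library(dplyr)
library(stringr)

# Invented example rows; only the column names mirror the real file.
toy <- data.frame(
  Year = c(2016, 2016, 2016, 2016, 2016),
  County = c("ALAMEDA", "ALAMEDA", "FRESNO", "FRESNO", "CALIFORNIA"),
  Total.Births = c(1000, 1000, 500, 500, 1500),
  Events = c(50, 10, NA, 5, 65),
  stringsAsFactors = FALSE
)

toy_clean <- toy %>%
  mutate(
    County = str_to_title(County),              # "ALAMEDA" -> "Alameda"
    Events = ifelse(is.na(Events), 0, Events)   # treat missing counts as zero
  ) %>%
  group_by(Year, County, Total.Births) %>%
  summarize(Events = sum(Events), .groups = "drop") %>%
  filter(str_to_lower(County) != "california") %>%  # drop the statewide total
  mutate(Rate = Events / Total.Births * 100)        # percent of live births

print(toy_clean)
```

Treating missing counts as zero follows the choice made above; if NA in the real file marks a suppressed small count rather than a true zero, the resulting rates are lower bounds.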

MCH Data Wrangling *Preterm Birth Only (for Map)

We also wanted to create a map to visualize the pattern of preterm birth in California. The CDC WONDER database had the same limitation here, so I used the following dataset instead (https://data.chhs.ca.gov/dataset/preterm-and-very-preterm-live-births/resource/cff79e2d-6ecf-4158-9e4f-7078632220ee). The cleaning process follows.

ptbirthdata<- read.csv("preterm-and-very-preterm-births-by-county-2010-2018-3.csv", header = TRUE, stringsAsFactors = FALSE)
ptbirthdata$Events[is.na(ptbirthdata$Events)] <- 0
ptbirthdata <- ptbirthdata[,-c(7,8)]
ptbirthdata <- ptbirthdata %>% group_by(Year, County, Total.Births) %>% summarize(Events = sum(Events)) 
## `summarise()` regrouping output by 'Year', 'County' (override with `.groups` argument)
ptbirthdata <- ptbirthdata %>% filter(str_to_lower(County) != "california") # remove the statewide total row, whatever its capitalization
ptbirthdata <- ptbirthdata %>% mutate(rate_pt = Events/Total.Births * 100)

Code for Creating the Map - Leaflet Map Using LBW Data (See Shiny App for the Final Result)

After cleaning the dataset, I built a spatial map. The code below reads a county shapefile, subsets it to California, and merges that spatial object with the low birth weight data wrangled earlier to generate a leaflet map. I chose a leaflet map because I wanted users to be able to identify each county and to zoom in and out. Note that some counties have NA values (perhaps counties with very small populations).

map <- readOGR(path.expand("cb_2018_us_county_20m.shp"),
               layer = "cb_2018_us_county_20m", stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\zay-z\Documents\Harvard Chan\Fall 2020\BST260\datascience-project\Data Prep (& Final RMD)\cb_2018_us_county_20m.shp", layer: "cb_2018_us_county_20m"
## with 3220 features
## It has 9 fields
## Integer64 fields read as strings:  ALAND AWATER
Statekey<-read.csv('./STATEFPtoSTATENAME_Key.csv', colClasses=c('character'))
map<-merge(x=map, y=Statekey, by="STATEFP", all=TRUE)
SingleState <- subset(map, map$STATENAME %in% c(
    "California"
))

lbwdata_2016 <- lbwdata %>% filter(Year == "2016") %>% mutate(Rate = Events/Total.Births*100)
spatial_lbw <- sp::merge(x = SingleState, y = lbwdata_2016, by.x = "NAME", by.y = "County")

bins <- c(4.0,6.3,7.6,8.1, Inf)
pal <- colorBin(
    palette = "viridis",
    domain = spatial_lbw$Rate, bins = bins)

leaflet(spatial_lbw, options = leafletOptions(zoomControl = TRUE, zoomLevelFixed = FALSE, dragging=TRUE, minZoom = 5.3, maxZoom = 9)) %>% 
            setView(lat = 36.778259, lng = -119.417931, zoom = 6) %>%
            addPolygons(color = "Black", weight = 1, smoothFactor = 0.5, 
                        opacity = 1.0, fillOpacity = 0.5, layerId = ~NAME,
                        fillColor = ~pal(Rate), 
                        popup = ~paste0("<center><font size=\"4\"><b>County: </b>", NAME, "</font></center>", "<b>% of Low Birth Weight Births: </b>", sprintf("%1.2f%%", Rate), "<br/>")) %>%
            addLegend(pal = pal, values = spatial_lbw$Rate, opacity = 1, title="% Low Birth Weight (Quartiles)")
## Warning in pal(Rate): Some values were outside the color scale and will be
## treated as NA
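The warning above indicates that some rates fall outside the hard-coded bin edges. One possible way to avoid this, sketched here under stated assumptions, is to compute the quartile edges from the data itself. The `rate` vector below is invented for illustration; in the map code it would correspond to `spatial_lbw$Rate`. Only base R is used, and the resulting `bins` vector has the same form as the one passed to `colorBin()` above.

```r
# Invented county rates (percent); NA stands for a county with no reported value.
rate <- c(3.2, 5.9, 6.3, 6.8, 7.6, 7.9, 8.1, 9.4, NA)

# Quartile cut points computed from the data, ignoring missing values.
breaks <- quantile(rate, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

# Drop names and any repeated edges so the cut points are unique.
bins <- unique(as.numeric(breaks))

# Every non-missing rate lies between the first and last edge.
in_range <- rate >= min(bins) & rate <= max(bins)

print(bins)
print(all(in_range, na.rm = TRUE))
```

Because the first and last edges are the observed minimum and maximum, no non-missing value falls outside the scale; counties with NA are still drawn in the palette's NA color.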

Code for Creating the Map - Leaflet Map Using Preterm Birth Data I Wrangled Earlier (See Shiny App for the Final Result)

This is a similar spatial map, but for preterm birth. The code below reads the county shapefile, subsets it to California, and merges that spatial object with the preterm birth data wrangled earlier to generate a leaflet map. Note that some counties have NA values (perhaps counties with very small populations).

map <- readOGR(path.expand("cb_2018_us_county_20m.shp"),
               layer = "cb_2018_us_county_20m", stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\zay-z\Documents\Harvard Chan\Fall 2020\BST260\datascience-project\Data Prep (& Final RMD)\cb_2018_us_county_20m.shp", layer: "cb_2018_us_county_20m"
## with 3220 features
## It has 9 fields
## Integer64 fields read as strings:  ALAND AWATER
Statekey<-read.csv('./STATEFPtoSTATENAME_Key.csv', colClasses=c('character'))
map<-merge(x=map, y=Statekey, by="STATEFP", all=TRUE)
SingleState <- subset(map, map$STATENAME %in% c(
    "California"
))

ptbirthdata_2016 <- ptbirthdata %>% filter(Year == "2016") 
spatial_pt <- sp::merge(x = SingleState, y = ptbirthdata_2016, by.x = "NAME", by.y = "County")

bin <- c(5.5, 8.2, 9.1, 9.9, Inf)
pal2 <- colorBin(
    palette = "plasma",
    domain = spatial_pt$rate_pt, bins = bin)

leaflet(spatial_pt, options = leafletOptions(zoomControl = TRUE, zoomLevelFixed = FALSE, dragging=TRUE, minZoom = 5.3, maxZoom = 9)) %>% 
                    setView(lat = 36.778259, lng = -119.417931, zoom = 6) %>%
                    addPolygons(color = "Black", weight = 1, smoothFactor = 0.5, 
                                opacity = 1.0, fillOpacity = 0.5, layerId = ~NAME,
                                fillColor = ~pal2(rate_pt), 
                                popup = ~paste0("<center><font size=\"4\"><b>County: </b>", NAME, "</font></center>", "<b>% of Preterm Birth: </b>", sprintf("%1.2f%%", rate_pt), "<br/>")) %>%
                    addLegend(pal = pal2, values = spatial_pt$rate_pt, opacity = 1, title="% Preterm Birth (Quartiles)")
## Warning in pal2(rate_pt): Some values were outside the color scale and will be
## treated as NA
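Counties can also show as NA when their names do not match between the shapefile and the data table, not only when the source has no value for them. A quick check before merging, sketched below with invented name vectors, is to compare the two sets of names. In the code above the vectors would come from `SingleState$NAME` and the `County` column of the data being merged.

```r
# Hypothetical names: polygons from the shapefile vs. counties in the data.
shape_names <- c("Alameda", "Alpine", "Fresno", "Sierra")
data_names  <- c("Alameda", "Fresno", "FRESNO ", "Kern")

# Polygons with no matching data row would show as NA on the map.
missing_in_data <- setdiff(shape_names, data_names)

# Data rows with no matching polygon would not appear on the map at all.
missing_in_shape <- setdiff(data_names, shape_names)

print(missing_in_data)
print(missing_in_shape)
```

Entries such as `"FRESNO "` in the second result point to capitalization or whitespace differences that can be fixed before the merge; entries in the first result that remain after cleaning are counties the source does not report.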

MCH Data Wrangling *New CDC Data (Lara)

#Data Wrangling for CDC Data (COMPLETE)
MCH.CDC.Data <- read.delim("NatalityTOTAL.txt",  sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
MCH.CDC.Data <- MCH.CDC.Data[-c(491:585), ]
MCH.CDC.Data <- MCH.CDC.Data %>% filter(Notes != "Total")
MCH.CDC.Data <- MCH.CDC.Data[ ,-c(1,3,5,7,9)]

MCH.CDC.Data_Race <- read.delim("NatalityRACE.txt",  sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
MCH.CDC.Data_Race <- MCH.CDC.Data_Race[-c(1794:1931), ]
MCH.CDC.Data_Race <- MCH.CDC.Data_Race[ ,-c(1,3,5,7,9,11)]

MCH.CDC.Data_Care <- read.delim("NatalityCARE.txt",  sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
MCH.CDC.Data_Care <- MCH.CDC.Data_Care[-c(11063:11487), ]
MCH.CDC.Data_Care <- MCH.CDC.Data_Care[ ,-c(1,3,5,7,9, 11,13)]

#Rename Counties to Match Pesticide Data
MCH.CDC.Data[MCH.CDC.Data$County == "Alameda County, CA", "County"] <-"Alameda"
MCH.CDC.Data[MCH.CDC.Data$County == "Butte County, CA", "County"] <-"Butte"
MCH.CDC.Data[MCH.CDC.Data$County == "Contra Costa County, CA", "County"] <-"Contra Costa"
MCH.CDC.Data[MCH.CDC.Data$County == "El Dorado County, CA", "County"] <-"El Dorado"
MCH.CDC.Data[MCH.CDC.Data$County == "Fresno County, CA", "County"] <-"Fresno"
MCH.CDC.Data[MCH.CDC.Data$County == "Humboldt County, CA", "County"] <-"Humboldt"
MCH.CDC.Data[MCH.CDC.Data$County == "Imperial County, CA", "County"] <-"Imperial"
MCH.CDC.Data[MCH.CDC.Data$County == "Kern County, CA", "County"] <-"Kern"
MCH.CDC.Data[MCH.CDC.Data$County == "Kings County, CA", "County"] <-"Kings"
MCH.CDC.Data[MCH.CDC.Data$County == "Los Angeles County, CA", "County"] <-"Los Angeles"
MCH.CDC.Data[MCH.CDC.Data$County == "Madera County, CA", "County"] <-"Madera"
MCH.CDC.Data[MCH.CDC.Data$County == "Marin County, CA", "County"] <-"Marin"
MCH.CDC.Data[MCH.CDC.Data$County == "Mariposa County, CA", "County"] <-"Mariposa"
MCH.CDC.Data[MCH.CDC.Data$County == "Merced County, CA", "County"] <-"Merced"
MCH.CDC.Data[MCH.CDC.Data$County == "Monterey County, CA", "County"] <-"Monterey"
MCH.CDC.Data[MCH.CDC.Data$County == "Napa County, CA", "County"] <-"Napa"
MCH.CDC.Data[MCH.CDC.Data$County == "Orange County, CA", "County"] <-"Orange"
MCH.CDC.Data[MCH.CDC.Data$County == "Placer County, CA", "County"] <-"Placer"
MCH.CDC.Data[MCH.CDC.Data$County == "Riverside County, CA", "County"] <-"Riverside"
MCH.CDC.Data[MCH.CDC.Data$County == "Sacramento County, CA", "County"] <-"Sacramento"
MCH.CDC.Data[MCH.CDC.Data$County == "San Bernardino County, CA", "County"] <-"San Bernardino"
MCH.CDC.Data[MCH.CDC.Data$County == "San Diego County, CA", "County"] <-"San Diego"
MCH.CDC.Data[MCH.CDC.Data$County == "San Francisco County, CA", "County"] <-"San Francisco"
MCH.CDC.Data[MCH.CDC.Data$County == "San Joaquin County, CA", "County"] <-"San Joaquin"
MCH.CDC.Data[MCH.CDC.Data$County == "San Luis Obispo County, CA", "County"] <-"San Luis Obispo"
MCH.CDC.Data[MCH.CDC.Data$County == "San Mateo County, CA", "County"] <-"San Mateo"
MCH.CDC.Data[MCH.CDC.Data$County == "Santa Barbara County, CA", "County"] <-"Santa Barbara"
MCH.CDC.Data[MCH.CDC.Data$County == "Santa Clara County, CA", "County"] <-"Santa Clara"
MCH.CDC.Data[MCH.CDC.Data$County == "Santa Cruz County, CA", "County"] <-"Santa Cruz"
MCH.CDC.Data[MCH.CDC.Data$County == "Shasta County, CA", "County"] <-"Shasta"
MCH.CDC.Data[MCH.CDC.Data$County == "Solano County, CA", "County"] <-"Solano"
MCH.CDC.Data[MCH.CDC.Data$County == "Sonoma County, CA", "County"] <-"Sonoma"
MCH.CDC.Data[MCH.CDC.Data$County == "Stanislaus County, CA", "County"] <-"Stanislaus"
MCH.CDC.Data[MCH.CDC.Data$County == "Tulare County, CA", "County"] <-"Tulare"
MCH.CDC.Data[MCH.CDC.Data$County == "Ventura County, CA", "County"] <-"Ventura"
MCH.CDC.Data[MCH.CDC.Data$County == "Yolo County, CA", "County"] <-"Yolo"

MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Alameda County, CA", "County"] <-"Alameda"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Butte County, CA", "County"] <-"Butte"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Contra Costa County, CA", "County"] <-"Contra Costa"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "El Dorado County, CA", "County"] <-"El Dorado"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Fresno County, CA", "County"] <-"Fresno"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Humboldt County, CA", "County"] <-"Humboldt"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Imperial County, CA", "County"] <-"Imperial"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Kern County, CA", "County"] <-"Kern"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Kings County, CA", "County"] <-"Kings"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Los Angeles County, CA", "County"] <-"Los Angeles"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Madera County, CA", "County"] <-"Madera"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Marin County, CA", "County"] <-"Marin"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Mariposa County, CA", "County"] <-"Mariposa"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Merced County, CA", "County"] <-"Merced"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Monterey County, CA", "County"] <-"Monterey"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Napa County, CA", "County"] <-"Napa"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Orange County, CA", "County"] <-"Orange"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Placer County, CA", "County"] <-"Placer"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Riverside County, CA", "County"] <-"Riverside"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Sacramento County, CA", "County"] <-"Sacramento"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "San Bernardino County, CA", "County"] <-"San Bernardino"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "San Diego County, CA", "County"] <-"San Diego"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "San Francisco County, CA", "County"] <-"San Francisco"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "San Joaquin County, CA", "County"] <-"San Joaquin"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "San Luis Obispo County, CA", "County"] <-"San Luis Obispo"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "San Mateo County, CA", "County"] <-"San Mateo"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Santa Barbara County, CA", "County"] <-"Santa Barbara"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Santa Clara County, CA", "County"] <-"Santa Clara"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Santa Cruz County, CA", "County"] <-"Santa Cruz"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Shasta County, CA", "County"] <-"Shasta"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Solano County, CA", "County"] <-"Solano"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Sonoma County, CA", "County"] <-"Sonoma"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Stanislaus County, CA", "County"] <-"Stanislaus"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Tulare County, CA", "County"] <-"Tulare"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Ventura County, CA", "County"] <-"Ventura"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Yolo County, CA", "County"] <-"Yolo"

MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Alameda County, CA", "County"] <-"Alameda"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Butte County, CA", "County"] <-"Butte"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Contra Costa County, CA", "County"] <-"Contra Costa"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "El Dorado County, CA", "County"] <-"El Dorado"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Fresno County, CA", "County"] <-"Fresno"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Humboldt County, CA", "County"] <-"Humboldt"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Imperial County, CA", "County"] <-"Imperial"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Kern County, CA", "County"] <-"Kern"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Kings County, CA", "County"] <-"Kings"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Los Angeles County, CA", "County"] <-"Los Angeles"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Madera County, CA", "County"] <-"Madera"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Marin County, CA", "County"] <-"Marin"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Mariposa County, CA", "County"] <-"Mariposa"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Merced County, CA", "County"] <-"Merced"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Monterey County, CA", "County"] <-"Monterey"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Napa County, CA", "County"] <-"Napa"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Orange County, CA", "County"] <-"Orange"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Placer County, CA", "County"] <-"Placer"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Riverside County, CA", "County"] <-"Riverside"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Sacramento County, CA", "County"] <-"Sacramento"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "San Bernardino County, CA", "County"] <-"San Bernardino"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "San Diego County, CA", "County"] <-"San Diego"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "San Francisco County, CA", "County"] <-"San Francisco"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "San Joaquin County, CA", "County"] <-"San Joaquin"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "San Luis Obispo County, CA", "County"] <-"San Luis Obispo"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "San Mateo County, CA", "County"] <-"San Mateo"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Santa Barbara County, CA", "County"] <-"Santa Barbara"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Santa Clara County, CA", "County"] <-"Santa Clara"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Santa Cruz County, CA", "County"] <-"Santa Cruz"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Shasta County, CA", "County"] <-"Shasta"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Solano County, CA", "County"] <-"Solano"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Sonoma County, CA", "County"] <-"Sonoma"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Stanislaus County, CA", "County"] <-"Stanislaus"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Tulare County, CA", "County"] <-"Tulare"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Ventura County, CA", "County"] <-"Ventura"
MCH.CDC.Data_Care[MCH.CDC.Data_Care$County == "Yolo County, CA", "County"] <-"Yolo"

MCH.CDC.Data_Race<- MCH.CDC.Data_Race %>% rename("Mothers.Race" = "Mother.s.Bridged.Race")
MCH.CDC.Data_Care<- MCH.CDC.Data_Care %>% rename("Mothers.Race" = "Mother.s.Bridged.Race")
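The county renaming above repeats one assignment per county for each of the three tables, which makes copy-paste slips easy to miss. A more compact alternative, shown here as a sketch rather than the code used in this project, strips the shared suffix with a single regular expression. The `county` vector is illustrative, and the "Unidentified Counties, CA" label is an assumed example of a value that should be left untouched.

```r
# Labels in the format used above; the last one is not a single county.
county <- c("Alameda County, CA", "Santa Clara County, CA",
            "Mariposa County, CA", "Unidentified Counties, CA")

# Strip the trailing " County, CA"; labels without that suffix are unchanged.
short <- sub(" County, CA$", "", county)

print(short)
```

Applied to the tables above, the same call would look like `MCH.CDC.Data$County <- sub(" County, CA$", "", MCH.CDC.Data$County)`, repeated for the race and care tables.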

Data Key:
‘MCH.CDC.Data’ is general data by county.
‘MCH.CDC.Data_Race’ is stratified by race.
‘MCH.CDC.Data_Care’ is stratified by race and includes the month prenatal care began.

Variable Key: Birth Rate = the number of births divided by the total population in the given year(s), expressed per 1,000.
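As a worked example of the birth rate definition in the variable key, the short function below computes births per 1,000 population. The function name and the numbers are made up for illustration and do not come from the CDC data.

```r
# Births per 1,000 population, following the definition in the variable key.
birth_rate <- function(births, population) {
  births / population * 1000
}

# Made-up numbers: 12,500 births in a population of 1,000,000.
print(birth_rate(12500, 1e6))
```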

Lara’s Plots with CDC Data (Exploratory Analysis)

#Fertility Rate GGPLOTS
MCH.CDC.Data %>% 
    ggplot(aes(x = Year, y = County,  fill = Fertility.Rate)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Fertility Rate", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Fertility Rate by County") + 
    ylab("") + xlab("")

MCH.CDC.Data %>% group_by(County) %>% filter(County == "Fresno") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Fresno County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") 

MCH.CDC.Data %>% group_by(County) %>% filter(County == "Kern") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Kern County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") 

MCH.CDC.Data %>% group_by(County) %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in San Joaquin County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") 

MCH.CDC.Data %>% group_by(County) %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Los Angeles County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") 

#Grid Plots
p1 <- MCH.CDC.Data %>% group_by(County) %>% filter(County == "Fresno") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Fresno County", subtitle = "2007-2019", color = "County") + theme(legend.position = "none")

p2 <- MCH.CDC.Data %>% group_by(County) %>% filter(County == "Kern") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Kern County", subtitle = "2007-2019", color = "County") + theme(legend.position = "none")

p3 <-MCH.CDC.Data %>% group_by(County) %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in San Joaquin County", subtitle = "2007-2019", color = "County") + theme(legend.position = "none")

p4 <-MCH.CDC.Data %>% group_by(County) %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Los Angeles County", subtitle = "2007-2019", color = "County") + theme(legend.position = "none")

grid.arrange(p1, p2, p3, p4, bottom = "Data Source: CDC WONDER Online Database")
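The four county plots above differ only in the county name, so the repeated code could be wrapped in a small helper. The sketch below is one way to do that, not the code used for the figures: `plot_county_fertility` is a hypothetical name, the `toy` data frame is invented, and only the column names (`County`, `Year`, `Fertility.Rate`) follow the CDC table used above.

```r
library(ggplot2)

# Hypothetical helper: one fertility-rate line plot for a single county.
plot_county_fertility <- function(data, county) {
  county_data <- data[data$County == county, ]
  ggplot(county_data, aes(Year, Fertility.Rate)) +
    geom_line() +
    labs(y = "Fertility Rate",
         title = paste("Fertility Rate in", county, "County"),
         subtitle = "2007-2019")
}

# Invented values, used only to show the call.
toy <- data.frame(
  County = rep(c("Fresno", "Kern"), each = 3),
  Year = rep(2017:2019, times = 2),
  Fertility.Rate = c(75.1, 73.4, 71.9, 78.2, 76.5, 74.8)
)

# One plot per county, collected in a list.
plots <- lapply(c("Fresno", "Kern"), function(cty) plot_county_fertility(toy, cty))
```

The resulting list can be laid out in a grid with `grid.arrange(grobs = plots)` from gridExtra, in place of naming each plot object separately.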

#Race and Fertility GGPLOTS
MCH.CDC.Data_Race %>% filter(Mothers.Race == "American Indian or Alaska Native") %>%
ggplot(aes(x = Year, y = County,  fill = Fertility.Rate)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Fertility Rate", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Fertility Rate by County for American Indian or Alaska Native Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Asian or Pacific Islander") %>%
ggplot(aes(x = Year, y = County,  fill = Fertility.Rate)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Fertility Rate", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Fertility Rate by County for Asian or Pacific Islander Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Black or African American") %>%
ggplot(aes(x = Year, y = County,  fill = Fertility.Rate)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Fertility Rate", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Fertility Rate by County for Black or African American Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "White") %>%
ggplot(aes(x = Year, y = County,  fill = Fertility.Rate)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Fertility Rate", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Fertility Rate by County for White Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(County == "Fresno") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in Fresno County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")

MCH.CDC.Data_Race %>% filter(County == "Kern") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in Kern County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")

MCH.CDC.Data_Race %>% filter(County == "San Bernardino") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in San Bernardino County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")

MCH.CDC.Data_Race %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in Los Angeles County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")

MCH.CDC.Data_Race %>% filter(County == "San Diego") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in San Diego County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")

MCH.CDC.Data_Race %>% filter(County == "Tulare") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in Tulare County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")

#Racial Demographic GGPLOTS
MCH.CDC.Data_Race %>% filter(Mothers.Race == "American Indian or Alaska Native") %>%
    ggplot(aes(x = Year, y = County,  fill = Total.Population)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("American Indian or Alaska Native Population", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Racial Demographics by County") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Asian or Pacific Islander") %>%
    ggplot(aes(x = Year, y = County,  fill = Total.Population)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Asian or Pacific Islander", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Racial Demographics by County") + 
    ylab("") + xlab("")

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Black or African American") %>%
    ggplot(aes(x = Year, y = County,  fill = Total.Population)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Black or African American Population", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Racial Demographics by County") + 
    ylab("") + xlab("")

MCH.CDC.Data_Race %>% filter(Mothers.Race == "White") %>%
    ggplot(aes(x = Year, y = County,  fill = Total.Population)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("White Population", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Racial Demographics by County") + 
    ylab("") + xlab("")

MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "Fresno") %>% ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line() + scale_y_log10() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in Fresno County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") 

MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "Kern") %>%  ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line()  + scale_y_log10() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in Kern County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") 

MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "San Joaquin") %>%  ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line()  + scale_y_log10() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in San Joaquin County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") 

MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "Los Angeles") %>%  ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line()  + scale_y_log10() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in Los Angeles County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  

MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "Tulare") %>%  ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line()  + scale_y_log10() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in Tulare County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  

#Preterm Birth GGPLOTS (With Race)
MCH.CDC.Data %>% 
    ggplot(aes(x = Year, y = County,  fill = Average.LMP.Gestational.Age)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Gestational Age by County") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "American Indian or Alaska Native") %>%
ggplot(aes(x = Year, y = County,  fill = Average.LMP.Gestational.Age)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Gestational Age by County for American Indian or Alaska Native Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Asian or Pacific Islander") %>%
ggplot(aes(x = Year, y = County,  fill = Average.LMP.Gestational.Age)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Gestational Age by County for Asian or Pacific Islander Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Black or African American") %>%
ggplot(aes(x = Year, y = County,  fill = Average.LMP.Gestational.Age)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Gestational Age by County for Black or African American Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "White") %>%
ggplot(aes(x = Year, y = County,  fill = Average.LMP.Gestational.Age)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Gestational Age by County for White Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(County == "Fresno") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in Fresno County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 37, size =2)

MCH.CDC.Data_Race %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in San Joaquin County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 37, size =2)

MCH.CDC.Data_Race %>% filter(County == "Kern") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in Kern County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 37, size =2)

MCH.CDC.Data_Race %>% filter(County == "San Bernardino") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in San Bernardino County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 37, size =2)

MCH.CDC.Data_Race %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in Los Angeles County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 37, size =2)

MCH.CDC.Data_Race %>% filter(County == "San Diego") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in San Diego County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 37, size =2)

MCH.CDC.Data_Race %>% filter(County == "Tulare") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in Tulare County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 37, size =2)

#Preterm Birth Cutoff is 37 Weeks (Horizontal Line)
#Birth weight GGPLOTS (With Race)
MCH.CDC.Data %>% 
    ggplot(aes(x = Year, y = County,  fill = Average.Birth.Weight)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Birth Weight in Grams", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Birth Weight by County") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "American Indian or Alaska Native") %>%
ggplot(aes(x = Year, y = County,  fill = Average.Birth.Weight)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Birth Weight in Grams", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Birth Weight by County for American Indian or Alaska Native Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Asian or Pacific Islander") %>%
ggplot(aes(x = Year, y = County,  fill = Average.Birth.Weight)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Birth Weight in Grams", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Birth Weight by County for Asian or Pacific Islander Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Black or African American") %>%
ggplot(aes(x = Year, y = County,  fill = Average.Birth.Weight)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Birth Weight in Grams", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Birth Weight by County for Black or African American Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "White") %>%
ggplot(aes(x = Year, y = County,  fill = Average.Birth.Weight)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Birth Weight in Grams", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Birth Weight by County for White Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(County == "Fresno") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in Fresno County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 2500, size =2)

MCH.CDC.Data_Race %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in San Joaquin County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 2500, size =2)

MCH.CDC.Data_Race %>% filter(County == "Kern") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in Kern County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 2500, size =2)

MCH.CDC.Data_Race %>% filter(County == "San Bernardino") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in San Bernardino County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 2500, size =2)

MCH.CDC.Data_Race %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in Los Angeles County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 2500, size =2)

MCH.CDC.Data_Race %>% filter(County == "San Diego") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in San Diego County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 2500, size =2)

MCH.CDC.Data_Race %>% filter(County == "Tulare") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in Tulare County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 2500, size =2)

#LBW Cutoff is 2500 Grams (Horizontal Line)
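
The per-county line plots above repeat one pipeline and change only the county name, the outcome column, and the cutoff. A minimal sketch of how a single helper could generate them, assuming the column names used above. `plot_county_outcome` is a hypothetical name, not part of the original analysis, and the toy data frame with invented values stands in for MCH.CDC.Data_Race. The line-thickness argument is left out of the sketch.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper (not part of the original analysis): draws one line per
# race group for a single county, plus a horizontal reference line at `cutoff`.
plot_county_outcome <- function(data, county, outcome, y_label, title_prefix, cutoff) {
  data %>%
    filter(County == .env$county) %>%
    ggplot(aes(x = Year, y = .data[[outcome]], color = Mothers.Race)) +
    geom_line() +
    geom_hline(yintercept = cutoff) +
    labs(y = y_label,
         title = paste(title_prefix, "in", county, "County"),
         subtitle = "2007-2019",
         color = "Mother's Race",
         caption = "Data Source: CDC WONDER Online Database")
}

# Toy stand-in for MCH.CDC.Data_Race; the values are invented for illustration.
toy_race <- data.frame(
  Year = rep(c(2007, 2008), times = 4),
  County = rep(c("Fresno", "Kern"), each = 4),
  Mothers.Race = rep(rep(c("White", "Black or African American"), each = 2), times = 2),
  Average.LMP.Gestational.Age = c(38.9, 38.8, 38.2, 38.3, 38.7, 38.6, 38.1, 38.0),
  stringsAsFactors = FALSE
)

# One call per county replaces one copy of the pipeline.
p_fresno <- plot_county_outcome(toy_race, "Fresno", "Average.LMP.Gestational.Age",
                                "Average LMP Gestational Age (Weeks)",
                                "Preterm Birth", cutoff = 37)
```

Looping over a vector of county names and printing each returned plot would reproduce the full set, and building the title inside the helper keeps the "County" suffix consistent across plots.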

Pesticide Data Wrangling

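The code below reads the 2016 pesticide use tables and the yearly county tables for 2007-2015, then reshapes the county figures for pounds applied into one long table by county and year.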

county_ranks16 <- read_delim("table1_county_rank_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   COUNTY = col_character(),
##   LBS_2015 = col_double(),
##   RANK_2015 = col_double(),
##   LBS_2016 = col_double(),
##   RANK_2016 = col_double()
## )
repro_lbs16 <- read_delim("table3_reproductive_lbs_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   CHEMICAL = col_character(),
##   LBS_2007 = col_double(),
##   LBS_2008 = col_double(),
##   LBS_2009 = col_double(),
##   LBS_2010 = col_double(),
##   LBS_2011 = col_double(),
##   LBS_2012 = col_double(),
##   LBS_2013 = col_double(),
##   LBS_2014 = col_double(),
##   LBS_2015 = col_double(),
##   LBS_2016 = col_double()
## )
repro_acre16 <- read_delim("table4_reproductive_acres_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   CHEMNAME = col_character(),
##   ACRES_2007 = col_double(),
##   ACRES_2008 = col_double(),
##   ACRES_2009 = col_double(),
##   ACRES_2010 = col_double(),
##   ACRES_2011 = col_double(),
##   ACRES_2012 = col_double(),
##   ACRES_2013 = col_double(),
##   ACRES_2014 = col_double(),
##   ACRES_2015 = col_double(),
##   ACRES_2016 = col_double()
## )
table1_2016 <- county_ranks16 %>% transmute(county = COUNTY, 
                                     lbs_2015 = LBS_2015, rank_2015 = RANK_2015, 
                                     lbs_2016 = LBS_2016, rank_2016 = RANK_2016)
# column 1 is the county
# columns 2-3 hold the previous year's data (lbs, rank)
# columns 4-5 hold the current year's data (lbs, rank)

# for every year before 2016, keep only columns 1-3 of each file:
# the previous-year columns carry the most up-to-date figures for that year
all_dat <- list(read_csv("table1_2007.csv")[1:3],
                read_csv("table1_2008.csv")[1:3],
                read_csv("table1_2009.csv")[1:3],
                read_csv("table1_2010.csv")[1:3],
                read_csv("table1_2011.csv")[1:3],
                read_csv("table1_2012.csv")[1:3],
                read_csv("table1_2013.csv")[1:3],
                read_csv("table1_2014.csv")[1:3],
                read_csv("table1_2015.csv")[1:3])
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2006 = col_double(),
##   rank_2006 = col_double(),
##   lbs_2007 = col_double(),
##   rank_2007 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2007 = col_double(),
##   rank_2007 = col_double(),
##   lbs_2008 = col_double(),
##   rank_2008 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2008 = col_double(),
##   rank_2008 = col_double(),
##   lbs_2009 = col_double(),
##   rank_2009 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2009 = col_double(),
##   rank_2009 = col_double(),
##   lbs_2010 = col_double(),
##   rank_2010 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2010 = col_double(),
##   rank_2010 = col_double(),
##   lbs_2011 = col_double(),
##   rank_2011 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2011 = col_double(),
##   rank_2011 = col_double(),
##   lbs_2012 = col_double(),
##   rank_2012 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2012 = col_double(),
##   rank_2012 = col_double(),
##   lbs_2013 = col_double(),
##   rank_2013 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2013 = col_double(),
##   rank_2013 = col_double(),
##   lbs_2014 = col_double(),
##   rank_2014 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2014 = col_double(),
##   rank_2014 = col_double(),
##   lbs_2015 = col_double(),
##   rank_2015 = col_double()
## )
table1 <- Reduce(function(x, y) left_join(x, y, by = "county"), all_dat)
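
The `Reduce()` call above folds the list of yearly tables into one wide table by joining each table onto the running result by county. A small sketch of the same pattern on toy data; the tables and numbers below are invented for illustration only.

```r
library(dplyr)

# Toy stand-ins for the yearly tables; the numbers are invented.
toy_tables <- list(
  tibble(county = c("Fresno", "Kern"), lbs_2006 = c(10, 20)),
  tibble(county = c("Fresno", "Kern"), lbs_2007 = c(11, 21)),
  tibble(county = c("Kern", "Fresno", "Tulare"), lbs_2008 = c(22, 12, 32))
)

# Fold the list left to right: ((table 1 join table 2) join table 3).
toy_joined <- Reduce(function(x, y) left_join(x, y, by = "county"), toy_tables)
```

Because `left_join()` keeps only the rows of its left-hand table, a county missing from the first file (Tulare in the toy data) drops out of the result. `full_join()` would keep it, with `NA` for the years it lacks.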


long_table1 <- table1 %>% pivot_longer(!county, names_to = 'usage', values_to = "value")
#table1_ranks <- long_table1 %>% filter(str_starts(usage, "rank"))
table1_lbs <- long_table1 %>% filter(str_starts(usage, "lbs"))

long_table2 <- table1_2016 %>% pivot_longer(!county, names_to = 'usage', values_to = "value")
#table1_ranks_1516 <- long_table2 %>% filter(str_starts(usage, "rank"))
table1_lbs_1516<- long_table2 %>% filter(str_starts(usage, "lbs"))

table1_lbs$usage <- as.numeric(gsub("[^[:digit:]]+", "", table1_lbs$usage))
table1_lbs_1516$usage <- as.numeric(gsub("[^[:digit:]]+", "", table1_lbs_1516$usage))
combined_pesticide_use <- table1_lbs %>% full_join(table1_lbs_1516) 
## Joining, by = c("county", "usage", "value")
class(combined_pesticide_use$usage) 
## [1] "numeric"
combined_pesticide_use <- combined_pesticide_use %>% group_by(usage) 
combined_pesticide_use <- combined_pesticide_use %>% arrange(usage)
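
The reshaping above turns the wide `lbs_YYYY` and `rank_YYYY` columns into long format, keeps only the `lbs` rows, and strips every non-digit character from the column name so that `usage` holds the year as a number. A minimal sketch of those steps on a toy table, with made-up values and hypothetical object names:

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-in for one wide table; the numbers are made up for illustration.
toy_rank_wide <- tibble(
  county    = c("Fresno", "Kern"),
  lbs_2014  = c(100, 200),
  rank_2014 = c(2, 1),
  lbs_2015  = c(110, 190),
  rank_2015 = c(2, 1)
)

toy_lbs_long <- toy_rank_wide %>%
  # one row per county and original column name
  pivot_longer(!county, names_to = "usage", values_to = "value") %>%
  # keep the pounds rows, drop the rank rows
  filter(str_starts(usage, "lbs")) %>%
  # "lbs_2014" -> 2014: remove everything that is not a digit
  mutate(usage = as.numeric(gsub("[^[:digit:]]+", "", usage))) %>%
  arrange(usage)
```

The result has one row per county and year, with `usage` as a numeric year ready for plotting on a continuous axis.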

Sonia’s Work with LBW Data: This section covers my data wrangling for the low birth weight (LBW) data from the CDC WONDER database. By default, the CDC WONDER live births database only displays counties with a population above 100,000. I looked only at low birth weight here, and the result feeds the bar graph in my Shiny app.

# MCH CDC data for low birth weight
# Data wrangling for the MCH CDC data
cdc_lowbirthweight <- read.delim("MCH CDC Data.txt", sep = "\t", dec = ".", header = TRUE, stringsAsFactors = FALSE)
cdc_lowbirthweight <- cdc_lowbirthweight[-c(482:538), ]     # drop rows 482-538
cdc_lowbirthweight <- cdc_lowbirthweight[, -c(1, 3, 5, 7)]  # drop columns 1, 3, 5, 7
MCH.CDC.Data.Total <- read.delim("MCH CDC Data Total.txt", sep = "\t", dec = ".", header = TRUE, stringsAsFactors = FALSE)
MCH.CDC.Data.Total <- MCH.CDC.Data.Total[, -c(1, 3, 5)]     # drop columns 1, 3, 5
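# The rename() result is not assigned, so the renamed table is only printed below;
# MCH.CDC.Data.Total itself keeps the column name "Births"
MCH.CDC.Data.Total %>% rename("Total Birth" = "Births")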
##     Year                     County Total Birth
## 1   2007         Alameda County, CA       21522
## 2   2007           Butte County, CA        2523
## 3   2007    Contra Costa County, CA       13487
## 4   2007       El Dorado County, CA        1882
## 5   2007          Fresno County, CA       17292
## 6   2007        Humboldt County, CA        1599
## 7   2007        Imperial County, CA        3146
## 8   2007            Kern County, CA       15336
## 9   2007           Kings County, CA        2781
## 10  2007     Los Angeles County, CA      151908
## 11  2007          Madera County, CA        2612
## 12  2007           Marin County, CA        2820
## 13  2007          Merced County, CA        4652
## 14  2007        Monterey County, CA        7551
## 15  2007            Napa County, CA        1665
## 16  2007          Orange County, CA       44038
## 17  2007          Placer County, CA        4054
## 18  2007       Riverside County, CA       34563
## 19  2007      Sacramento County, CA       22119
## 20  2007  San Bernardino County, CA       35190
## 21  2007       San Diego County, CA       47569
## 22  2007   San Francisco County, CA        9129
## 23  2007     San Joaquin County, CA       11600
## 24  2007 San Luis Obispo County, CA        2884
## 25  2007       San Mateo County, CA        9914
## 26  2007   Santa Barbara County, CA        6292
## 27  2007     Santa Clara County, CA       27490
## 28  2007      Santa Cruz County, CA        3571
## 29  2007          Shasta County, CA        2230
## 30  2007          Solano County, CA        5849
## 31  2007          Sonoma County, CA        5742
## 32  2007      Stanislaus County, CA        8827
## 33  2007          Tulare County, CA        8507
## 34  2007         Ventura County, CA       12198
## 35  2007            Yolo County, CA        2522
## 36  2007  Unidentified Counties, CA       11350
## 37  2007                                 566414
## 38  2008         Alameda County, CA       20976
## 39  2008           Butte County, CA        2520
## 40  2008    Contra Costa County, CA       13135
## 41  2008       El Dorado County, CA        1814
## 42  2008          Fresno County, CA       16764
## 43  2008        Humboldt County, CA        1601
## 44  2008        Imperial County, CA        3241
## 45  2008            Kern County, CA       15316
## 46  2008           Kings County, CA        2711
## 47  2008     Los Angeles County, CA      147745
## 48  2008          Madera County, CA        2535
## 49  2008           Marin County, CA        2719
## 50  2008          Merced County, CA        4422
## 51  2008        Monterey County, CA        7435
## 52  2008            Napa County, CA        1671
## 53  2008          Orange County, CA       42467
## 54  2008          Placer County, CA        4035
## 55  2008       Riverside County, CA       32881
## 56  2008      Sacramento County, CA       21397
## 57  2008  San Bernardino County, CA       33837
## 58  2008       San Diego County, CA       46755
## 59  2008   San Francisco County, CA        9106
## 60  2008     San Joaquin County, CA       11030
## 61  2008 San Luis Obispo County, CA        2739
## 62  2008       San Mateo County, CA        9770
## 63  2008   Santa Barbara County, CA        6320
## 64  2008     Santa Clara County, CA       26731
## 65  2008      Santa Cruz County, CA        3537
## 66  2008          Shasta County, CA        2186
## 67  2008          Solano County, CA        5609
## 68  2008          Sonoma County, CA        5763
## 69  2008      Stanislaus County, CA        8550
## 70  2008          Tulare County, CA        8535
## 71  2008         Ventura County, CA       12075
## 72  2008            Yolo County, CA        2669
## 73  2008  Unidentified Counties, CA       11182
## 74  2008                                 551779
## 75  2009         Alameda County, CA       20325
## 76  2009           Butte County, CA        2440
## 77  2009    Contra Costa County, CA       12686
## 78  2009       El Dorado County, CA        1727
## 79  2009          Fresno County, CA       16271
## 80  2009        Humboldt County, CA        1541
## 81  2009        Imperial County, CA        3151
## 82  2009            Kern County, CA       14828
## 83  2009           Kings County, CA        2645
## 84  2009     Los Angeles County, CA      139757
## 85  2009          Madera County, CA        2390
## 86  2009           Marin County, CA        2496
## 87  2009          Merced County, CA        4407
## 88  2009        Monterey County, CA        7070
## 89  2009            Napa County, CA        1653
## 90  2009          Orange County, CA       40437
## 91  2009          Placer County, CA        3810
## 92  2009       Riverside County, CA       31605
## 93  2009      Sacramento County, CA       20433
## 94  2009  San Bernardino County, CA       32006
## 95  2009       San Diego County, CA       44982
## 96  2009   San Francisco County, CA        8810
## 97  2009     San Joaquin County, CA       10876
## 98  2009 San Luis Obispo County, CA        2617
## 99  2009       San Mateo County, CA        9452
## 100 2009   Santa Barbara County, CA        6041
## 101 2009     Santa Clara County, CA       25203
## 102 2009      Santa Cruz County, CA        3299
## 103 2009          Shasta County, CA        2068
## 104 2009          Solano County, CA        5393
## 105 2009          Sonoma County, CA        5685
## 106 2009      Stanislaus County, CA        7942
## 107 2009          Tulare County, CA        8361
## 108 2009         Ventura County, CA       11360
## 109 2009            Yolo County, CA        2483
## 110 2009  Unidentified Counties, CA       10770
## 111 2009                                 527020
## 112 2010         Alameda County, CA       19306
## 113 2010           Butte County, CA        2457
## 114 2010    Contra Costa County, CA       12358
## 115 2010       El Dorado County, CA        1621
## 116 2010          Fresno County, CA       16283
## 117 2010        Humboldt County, CA        1551
## 118 2010        Imperial County, CA        3081
## 119 2010            Kern County, CA       14419
## 120 2010           Kings County, CA        2509
## 121 2010     Los Angeles County, CA      133252
## 122 2010          Madera County, CA        2434
## 123 2010           Marin County, CA        2371
## 124 2010          Merced County, CA        4249
## 125 2010        Monterey County, CA        6765
## 126 2010            Napa County, CA        1525
## 127 2010          Orange County, CA       38250
## 128 2010          Placer County, CA        3825
## 129 2010       Riverside County, CA       30670
## 130 2010      Sacramento County, CA       20056
## 131 2010  San Bernardino County, CA       31368
## 132 2010       San Diego County, CA       44867
## 133 2010   San Francisco County, CA        8806
## 134 2010     San Joaquin County, CA       10596
## 135 2010 San Luis Obispo County, CA        2735
## 136 2010       San Mateo County, CA        9194
## 137 2010   Santa Barbara County, CA        5821
## 138 2010     Santa Clara County, CA       23940
## 139 2010      Santa Cruz County, CA        3192
## 140 2010          Shasta County, CA        2136
## 141 2010          Solano County, CA        5050
## 142 2010          Sonoma County, CA        5393
## 143 2010      Stanislaus County, CA        7806
## 144 2010          Tulare County, CA        8155
## 145 2010         Ventura County, CA       11150
## 146 2010            Yolo County, CA        2427
## 147 2010  Unidentified Counties, CA       10580
## 148 2010                                 510198
## 149 2011         Alameda County, CA       19003
## 150 2011           Butte County, CA        2391
## 151 2011    Contra Costa County, CA       12060
## 152 2011       El Dorado County, CA        1630
## 153 2011          Fresno County, CA       16160
## 154 2011        Humboldt County, CA        1448
## 155 2011        Imperial County, CA        3079
## 156 2011            Kern County, CA       14287
## 157 2011           Kings County, CA        2567
## 158 2011     Los Angeles County, CA      130370
## 159 2011          Madera County, CA        2401
## 160 2011           Marin County, CA        2386
## 161 2011          Merced County, CA        4280
## 162 2011        Monterey County, CA        6812
## 163 2011            Napa County, CA        1572
## 164 2011          Orange County, CA       38101
## 165 2011          Placer County, CA        3834
## 166 2011       Riverside County, CA       30611
## 167 2011      Sacramento County, CA       20002
## 168 2011  San Bernardino County, CA       30566
## 169 2011       San Diego County, CA       43643
## 170 2011   San Francisco County, CA        8813
## 171 2011     San Joaquin County, CA       10329
## 172 2011 San Luis Obispo County, CA        2631
## 173 2011       San Mateo County, CA        9048
## 174 2011   Santa Barbara County, CA        5804
## 175 2011     Santa Clara County, CA       23649
## 176 2011      Santa Cruz County, CA        3233
## 177 2011          Shasta County, CA        2022
## 178 2011          Solano County, CA        5160
## 179 2011          Sonoma County, CA        5150
## 180 2011      Stanislaus County, CA        7738
## 181 2011          Tulare County, CA        7966
## 182 2011         Ventura County, CA       10656
## 183 2011            Yolo County, CA        2341
## 184 2011  Unidentified Counties, CA       10377
## 185 2011                                 502120
## 186 2012         Alameda County, CA       19546
## 187 2012           Butte County, CA        2399
## 188 2012    Contra Costa County, CA       12065
## 189 2012       El Dorado County, CA        1513
## 190 2012          Fresno County, CA       15955
## 191 2012        Humboldt County, CA        1504
## 192 2012        Imperial County, CA        3054
## 193 2012            Kern County, CA       14553
## 194 2012           Kings County, CA        2358
## 195 2012     Los Angeles County, CA      131664
## 196 2012          Madera County, CA        2257
## 197 2012           Marin County, CA        2305
## 198 2012          Merced County, CA        4312
## 199 2012        Monterey County, CA        6652
## 200 2012            Napa County, CA        1431
## 201 2012          Orange County, CA       38183
## 202 2012          Placer County, CA        3648
## 203 2012       Riverside County, CA       30300
## 204 2012      Sacramento County, CA       19623
## 205 2012  San Bernardino County, CA       30701
## 206 2012       San Diego County, CA       44396
## 207 2012   San Francisco County, CA        9075
## 208 2012     San Joaquin County, CA       10129
## 209 2012 San Luis Obispo County, CA        2580
## 210 2012       San Mateo County, CA        9185
## 211 2012   Santa Barbara County, CA        5585
## 212 2012     Santa Clara County, CA       24308
## 213 2012      Santa Cruz County, CA        3083
## 214 2012          Shasta County, CA        2109
## 215 2012          Solano County, CA        5062
## 216 2012          Sonoma County, CA        5143
## 217 2012      Stanislaus County, CA        7591
## 218 2012          Tulare County, CA        8000
## 219 2012         Ventura County, CA       10641
## 220 2012            Yolo County, CA        2451
## 221 2012  Unidentified Counties, CA       10394
## 222 2012                                 503755
## 223 2013         Alameda County, CA       19257
## 224 2013           Butte County, CA        2415
## 225 2013    Contra Costa County, CA       12154
## 226 2013       El Dorado County, CA        1534
## 227 2013          Fresno County, CA       15737
## 228 2013        Humboldt County, CA        1531
## 229 2013        Imperial County, CA        3094
## 230 2013            Kern County, CA       14149
## 231 2013           Kings County, CA        2394
## 232 2013     Los Angeles County, CA      128598
## 233 2013          Madera County, CA        2315
## 234 2013           Marin County, CA        2321
## 235 2013          Merced County, CA        4162
## 236 2013        Monterey County, CA        6547
## 237 2013            Napa County, CA        1450
## 238 2013          Orange County, CA       37281
## 239 2013          Placer County, CA        3688
## 240 2013       Riverside County, CA       29941
## 241 2013      Sacramento County, CA       19371
## 242 2013  San Bernardino County, CA       30246
## 243 2013       San Diego County, CA       43659
## 244 2013   San Francisco County, CA        8814
## 245 2013     San Joaquin County, CA        9800
## 246 2013 San Luis Obispo County, CA        2650
## 247 2013       San Mateo County, CA        8824
## 248 2013   Santa Barbara County, CA        5755
## 249 2013     Santa Clara County, CA       23313
## 250 2013      Santa Cruz County, CA        2871
## 251 2013          Shasta County, CA        2143
## 252 2013          Solano County, CA        5259
## 253 2013          Sonoma County, CA        4983
## 254 2013      Stanislaus County, CA        7579
## 255 2013          Tulare County, CA        7653
## 256 2013         Ventura County, CA       10446
## 257 2013            Yolo County, CA        2491
## 258 2013  Unidentified Counties, CA       10280
## 259 2013                                 494705
## 260 2014         Alameda County, CA       19650
## 261 2014           Butte County, CA        2481
## 262 2014    Contra Costa County, CA       12557
## 263 2014       El Dorado County, CA        1618
## 264 2014          Fresno County, CA       15762
## 265 2014        Humboldt County, CA        1468
## 266 2014        Imperial County, CA        3226
## 267 2014            Kern County, CA       14193
## 268 2014           Kings County, CA        2350
## 269 2014     Los Angeles County, CA      130289
## 270 2014          Madera County, CA        2313
## 271 2014           Marin County, CA        2401
## 272 2014          Merced County, CA        4164
## 273 2014        Monterey County, CA        6455
## 274 2014            Napa County, CA        1475
## 275 2014          Orange County, CA       38595
## 276 2014          Placer County, CA        3631
## 277 2014       Riverside County, CA       30235
## 278 2014      Sacramento County, CA       19871
## 279 2014  San Bernardino County, CA       31226
## 280 2014       San Diego County, CA       44596
## 281 2014   San Francisco County, CA        9104
## 282 2014     San Joaquin County, CA       10113
## 283 2014 San Luis Obispo County, CA        2596
## 284 2014       San Mateo County, CA        9083
## 285 2014   Santa Barbara County, CA        5830
## 286 2014     Santa Clara County, CA       23742
## 287 2014      Santa Cruz County, CA        3069
## 288 2014          Shasta County, CA        2083
## 289 2014          Solano County, CA        5253
## 290 2014          Sonoma County, CA        5070
## 291 2014      Stanislaus County, CA        7511
## 292 2014          Tulare County, CA        7640
## 293 2014         Ventura County, CA       10468
## 294 2014            Yolo County, CA        2394
## 295 2014  Unidentified Counties, CA       10367
## 296 2014                                 502879
## 297 2015         Alameda County, CA       19434
## 298 2015           Butte County, CA        2442
## 299 2015    Contra Costa County, CA       12596
## 300 2015       El Dorado County, CA        1594
## 301 2015          Fresno County, CA       15359
## 302 2015        Humboldt County, CA        1441
## 303 2015        Imperial County, CA        3168
## 304 2015            Kern County, CA       13768
## 305 2015           Kings County, CA        2274
## 306 2015     Los Angeles County, CA      124641
## 307 2015          Madera County, CA        2225
## 308 2015           Marin County, CA        2288
## 309 2015          Merced County, CA        4104
## 310 2015        Monterey County, CA        6420
## 311 2015            Napa County, CA        1457
## 312 2015          Orange County, CA       37609
## 313 2015          Placer County, CA        3747
## 314 2015       Riverside County, CA       30491
## 315 2015      Sacramento County, CA       19423
## 316 2015  San Bernardino County, CA       30530
## 317 2015       San Diego County, CA       43942
## 318 2015   San Francisco County, CA        8972
## 319 2015     San Joaquin County, CA        9983
## 320 2015 San Luis Obispo County, CA        2668
## 321 2015       San Mateo County, CA        9037
## 322 2015   Santa Barbara County, CA        5673
## 323 2015     Santa Clara County, CA       23393
## 324 2015      Santa Cruz County, CA        2840
## 325 2015          Shasta County, CA        2073
## 326 2015          Solano County, CA        5131
## 327 2015          Sonoma County, CA        5015
## 328 2015      Stanislaus County, CA        7698
## 329 2015          Tulare County, CA        7411
## 330 2015         Ventura County, CA       10060
## 331 2015            Yolo County, CA        2402
## 332 2015  Unidentified Counties, CA       10439
## 333 2015                                 491748
## 334 2016         Alameda County, CA       19573
## 335 2016           Butte County, CA        2490
## 336 2016    Contra Costa County, CA       12340
## 337 2016       El Dorado County, CA        1601
## 338 2016          Fresno County, CA       15129
## 339 2016        Humboldt County, CA        1482
## 340 2016        Imperial County, CA        2939
## 341 2016            Kern County, CA       13728
## 342 2016           Kings County, CA        2248
## 343 2016     Los Angeles County, CA      123092
## 344 2016          Madera County, CA        2355
## 345 2016           Marin County, CA        2252
## 346 2016          Merced County, CA        4117
## 347 2016        Monterey County, CA        6219
## 348 2016            Napa County, CA        1406
## 349 2016          Orange County, CA       38106
## 350 2016          Placer County, CA        3732
## 351 2016       Riverside County, CA       30661
## 352 2016      Sacramento County, CA       19588
## 353 2016  San Bernardino County, CA       31032
## 354 2016       San Diego County, CA       42720
## 355 2016   San Francisco County, CA        9062
## 356 2016     San Joaquin County, CA       10268
## 357 2016 San Luis Obispo County, CA        2581
## 358 2016       San Mateo County, CA        8960
## 359 2016   Santa Barbara County, CA        5501
## 360 2016     Santa Clara County, CA       23042
## 361 2016      Santa Cruz County, CA        2799
## 362 2016          Shasta County, CA        2048
## 363 2016          Solano County, CA        5259
## 364 2016          Sonoma County, CA        4962
## 365 2016      Stanislaus County, CA        7862
## 366 2016          Tulare County, CA        7146
## 367 2016         Ventura County, CA        9592
## 368 2016            Yolo County, CA        2423
## 369 2016  Unidentified Counties, CA       10512
## 370 2016                                 488827
## 371 2017         Alameda County, CA       18888
## 372 2017           Butte County, CA        2386
## 373 2017    Contra Costa County, CA       12180
## 374 2017       El Dorado County, CA        1570
## 375 2017          Fresno County, CA       14541
## 376 2017        Humboldt County, CA        1372
## 377 2017        Imperial County, CA        2894
## 378 2017            Kern County, CA       13326
## 379 2017           Kings County, CA        2373
## 380 2017     Los Angeles County, CA      116950
## 381 2017          Madera County, CA        2120
## 382 2017           Marin County, CA        2237
## 383 2017          Merced County, CA        4202
## 384 2017        Monterey County, CA        5810
## 385 2017            Napa County, CA        1291
## 386 2017          Orange County, CA       37369
## 387 2017          Placer County, CA        3689
## 388 2017       Riverside County, CA       29857
## 389 2017      Sacramento County, CA       19202
## 390 2017  San Bernardino County, CA       29643
## 391 2017       San Diego County, CA       41230
## 392 2017   San Francisco County, CA        8947
## 393 2017     San Joaquin County, CA        9928
## 394 2017 San Luis Obispo County, CA        2550
## 395 2017       San Mateo County, CA        8585
## 396 2017   Santa Barbara County, CA        5531
## 397 2017     Santa Clara County, CA       22133
## 398 2017      Santa Cruz County, CA        2658
## 399 2017          Shasta County, CA        2008
## 400 2017          Solano County, CA        5131
## 401 2017          Sonoma County, CA        4642
## 402 2017      Stanislaus County, CA        7441
## 403 2017          Tulare County, CA        7130
## 404 2017         Ventura County, CA        9318
## 405 2017            Yolo County, CA        2272
## 406 2017  Unidentified Counties, CA       10254
## 407 2017                                 471658
## 408 2018         Alameda County, CA       18240
## 409 2018           Butte County, CA        2430
## 410 2018    Contra Costa County, CA       12002
## 411 2018       El Dorado County, CA        1674
## 412 2018          Fresno County, CA       14465
## 413 2018        Humboldt County, CA        1364
## 414 2018        Imperial County, CA        2629
## 415 2018            Kern County, CA       12916
## 416 2018           Kings County, CA        2262
## 417 2018     Los Angeles County, CA      110271
## 418 2018          Madera County, CA        2079
## 419 2018           Marin County, CA        2127
## 420 2018          Merced County, CA        3875
## 421 2018        Monterey County, CA        5895
## 422 2018            Napa County, CA        1204
## 423 2018          Orange County, CA       35679
## 424 2018          Placer County, CA        3663
## 425 2018       Riverside County, CA       28725
## 426 2018      Sacramento County, CA       19102
## 427 2018  San Bernardino County, CA       28994
## 428 2018       San Diego County, CA       40070
## 429 2018   San Francisco County, CA        8697
## 430 2018     San Joaquin County, CA        9841
## 431 2018 San Luis Obispo County, CA        2445
## 432 2018       San Mateo County, CA        8330
## 433 2018   Santa Barbara County, CA        5268
## 434 2018     Santa Clara County, CA       21292
## 435 2018      Santa Cruz County, CA        2449
## 436 2018          Shasta County, CA        1966
## 437 2018          Solano County, CA        5033
## 438 2018          Sonoma County, CA        4526
## 439 2018      Stanislaus County, CA        7364
## 440 2018          Tulare County, CA        6905
## 441 2018         Ventura County, CA        9065
## 442 2018            Yolo County, CA        2135
## 443 2018  Unidentified Counties, CA        9938
## 444 2018                                 454920
## 445 2019         Alameda County, CA       18212
## 446 2019           Butte County, CA        2154
## 447 2019    Contra Costa County, CA       11729
## 448 2019       El Dorado County, CA        1524
## 449 2019          Fresno County, CA       14057
## 450 2019        Humboldt County, CA        1417
## 451 2019        Imperial County, CA        2533
## 452 2019            Kern County, CA       12765
## 453 2019           Kings County, CA        2115
## 454 2019     Los Angeles County, CA      107231
## 455 2019          Madera County, CA        2045
## 456 2019           Marin County, CA        2071
## 457 2019          Merced County, CA        3806
## 458 2019        Monterey County, CA        5846
## 459 2019            Napa County, CA        1279
## 460 2019          Orange County, CA       35052
## 461 2019          Placer County, CA        3658
## 462 2019       Riverside County, CA       28026
## 463 2019      Sacramento County, CA       19089
## 464 2019  San Bernardino County, CA       28656
## 465 2019       San Diego County, CA       38540
## 466 2019   San Francisco County, CA        8438
## 467 2019     San Joaquin County, CA       10009
## 468 2019 San Luis Obispo County, CA        2447
## 469 2019       San Mateo County, CA        8206
## 470 2019   Santa Barbara County, CA        5537
## 471 2019     Santa Clara County, CA       21184
## 472 2019      Santa Cruz County, CA        2434
## 473 2019          Shasta County, CA        1903
## 474 2019          Solano County, CA        5065
## 475 2019          Sonoma County, CA        4395
## 476 2019      Stanislaus County, CA        7402
## 477 2019          Tulare County, CA        6714
## 478 2019         Ventura County, CA        8736
## 479 2019            Yolo County, CA        2057
## 480 2019  Unidentified Counties, CA       10147
## 481 2019                                 446479
## 482   NA                                6512502
## 483   NA                                     NA
## 484   NA                                     NA
## 485   NA                                     NA
## 486   NA                                     NA
## 487   NA                                     NA
## 488   NA                                     NA
## 489   NA                                     NA
## 490   NA                                     NA
## 491   NA                                     NA
## 492   NA                                     NA
## 493   NA                                     NA
## 494   NA                                     NA
## 495   NA                                     NA
## 496   NA                                     NA
## 497   NA                                     NA
## 498   NA                                     NA
## 499   NA                                     NA
## 500   NA                                     NA
## 501   NA                                     NA
## 502   NA                                     NA
## 503   NA                                     NA
## 504   NA                                     NA
## 505   NA                                     NA
## 506   NA                                     NA
## 507   NA                                     NA
## 508   NA                                     NA
## 509   NA                                     NA
## 510   NA                                     NA
## 511   NA                                     NA
## 512   NA                                     NA
## 513   NA                                     NA
## 514   NA                                     NA
## 515   NA                                     NA
## 516   NA                                     NA
## 517   NA                                     NA
## 518   NA                                     NA
## 519   NA                                     NA
## 520   NA                                     NA
## 521   NA                                     NA
## 522   NA                                     NA
## 523   NA                                     NA
# The low birth weight data I downloaded did not include the total number of births, so I merge two datasets: one with total birth counts and one with low birth weight + very low birth weight counts
df1 <- full_join(cdc_lowbirthweight , MCH.CDC.Data.Total, by=c("Year", "County"))
df1<- df1 %>% rename("cases" = "Births.x", "total_births" = "Births.y")

#Note: cases = low birth weight + very low birth weight counts; total_births = total number of births
col_order <- c("Year", "County", "total_births",
               "cases", "Average.Birth.Weight", "Standard.Deviation.for.Average.Birth.Weight",
               "Average.Age.of.Mother", "Standard.Deviation.for.Average.Age.of.Mother","Average.LMP.Gestational.Age",
               "Standard.Deviation.for.Average.LMP.Gestational.Age")
df2 <- df1[,col_order]
df2[df2$County == "Alameda County, CA", "County"] <-"alameda"
df2[df2$County == "Butte County, CA", "County"] <-"butte"
df2[df2$County == "Contra Costa County, CA", "County"] <-"contra costa"
df2[df2$County == "El Dorado County, CA", "County"] <-"el dorado"
df2[df2$County == "Fresno County, CA", "County"] <-"fresno"
df2[df2$County == "Humboldt County, CA", "County"] <-"humboldt"
df2[df2$County == "Imperial County, CA", "County"] <-"imperial"
df2[df2$County == "Kern County, CA", "County"] <-"kern"
df2[df2$County == "Kings County, CA", "County"] <-"kings"
df2[df2$County == "Los Angeles County, CA", "County"] <-"los angeles"
df2[df2$County == "Madera County, CA", "County"] <-"madera"
df2[df2$County == "Marin County, CA", "County"] <-"marin"
df2[df2$County == "Mariposa County, CA", "County"] <-"mariposa"
df2[df2$County == "Merced County, CA", "County"] <-"merced"
df2[df2$County == "Monterey County, CA", "County"] <-"monterey"
df2[df2$County == "Napa County, CA", "County"] <-"napa"
df2[df2$County == "Orange County, CA", "County"] <-"orange"
df2[df2$County == "Placer County, CA", "County"] <-"placer"
df2[df2$County == "Riverside County, CA", "County"] <-"riverside"
df2[df2$County == "Sacramento County, CA", "County"] <-"sacramento"
df2[df2$County == "San Bernardino County, CA", "County"] <-"san bernardino"
df2[df2$County == "San Diego County, CA", "County"] <-"san diego"
df2[df2$County == "San Francisco County, CA", "County"] <-"san francisco"
df2[df2$County == "San Joaquin County, CA", "County"] <-"san joaquin"
df2[df2$County == "San Luis Obispo County, CA", "County"] <-"san luis obispo"
df2[df2$County == "San Mateo County, CA", "County"] <-"san mateo"
df2[df2$County == "Santa Barbara County, CA", "County"] <-"santa barbara"
df2[df2$County == "Santa Clara County, CA", "County"] <-"santa clara"
df2[df2$County == "Santa Cruz County, CA", "County"] <-"santa cruz"
df2[df2$County == "Shasta County, CA", "County"] <-"shasta"
df2[df2$County == "Solano County, CA", "County"] <-"solano"
df2[df2$County == "Sonoma County, CA", "County"] <-"sonoma"
df2[df2$County == "Stanislaus County, CA", "County"] <-"stanislaus"
df2[df2$County == "Tulare County, CA", "County"] <-"tulare"
df2[df2$County == "Ventura County, CA", "County"] <-"ventura"
df2[df2$County == "Yolo County, CA", "County"] <-"yolo"
df2 <- df2 %>% filter(!is.na(total_births)) %>% filter(!is.na(cases)) %>% mutate(rate = cases/total_births * 10^2)
df2$County <- df2$County %>% str_to_title()
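The county recoding above could also be done in one vectorized step. The following is a minimal sketch on a toy vector, assuming every county label follows the `<Name> County, CA` pattern seen in the CDC WONDER output; `clean_county()` is a hypothetical helper and not part of the original analysis.

```r
library(stringr)

# Hypothetical helper: strip the " County, CA" suffix, then title-case.
# Labels that do not end in " County, CA" (e.g. "Unidentified Counties, CA")
# keep their text and are only title-cased.
clean_county <- function(x) {
  str_to_title(str_remove(x, " County, CA$"))
}

toy <- c("Alameda County, CA", "San Luis Obispo County, CA",
         "Unidentified Counties, CA")
cleaned <- clean_county(toy)
print(cleaned)
```

A pattern-based cleanup avoids typing each county by hand, which is where copy-paste slips tend to creep in.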

Sonia’s CDC WONDER Database Work With Preterm Birth: This is my data wrangling process for the preterm birth data from the CDC WONDER database. By default, the CDC WONDER live birth database only displays counties with a population greater than 100,000. I looked only at preterm birth here; the result feeds the bar graph in my Shiny app.

cdc_pretermbirth <- read.delim("Preterm birth.txt",  sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
cdc_pretermbirth  <- cdc_pretermbirth [-c(422:472), ]
cdc_pretermbirth  <- cdc_pretermbirth [ ,-c(1, 3, 5)]
cdc_pretermbirth <- cdc_pretermbirth %>% rename("Events" = "Births")
MCH.CDC.Data.Total <- read.delim("MCH CDC Data Total.txt",  sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
MCH.CDC.Data.Total <- MCH.CDC.Data.Total[,-c(1, 3, 5)]
MCH.CDC.Data.Total <- MCH.CDC.Data.Total %>% rename("total_birth" = "Births")

df1_pt <- full_join(cdc_pretermbirth , MCH.CDC.Data.Total, by=c("Year", "County"))

df1_pt[df1_pt$County == "Alameda County, CA", "County"] <-"alameda"
df1_pt[df1_pt$County == "Butte County, CA", "County"] <-"butte"
df1_pt[df1_pt$County == "Contra Costa County, CA", "County"] <-"contra costa"
df1_pt[df1_pt$County == "El Dorado County, CA", "County"] <-"el dorado"
df1_pt[df1_pt$County == "Fresno County, CA", "County"] <-"fresno"
df1_pt[df1_pt$County == "Humboldt County, CA", "County"] <-"humboldt"
df1_pt[df1_pt$County == "Imperial County, CA", "County"] <-"imperial"
df1_pt[df1_pt$County == "Kern County, CA", "County"] <-"kern"
df1_pt[df1_pt$County == "Kings County, CA", "County"] <-"kings"
df1_pt[df1_pt$County == "Los Angeles County, CA", "County"] <-"los angeles"
df1_pt[df1_pt$County == "Madera County, CA", "County"] <-"madera"
df1_pt[df1_pt$County == "Marin County, CA", "County"] <-"marin"
df1_pt[df1_pt$County == "Mariposa County, CA", "County"] <-"mariposa"
df1_pt[df1_pt$County == "Merced County, CA", "County"] <-"merced"
df1_pt[df1_pt$County == "Monterey County, CA", "County"] <-"monterey"
df1_pt[df1_pt$County == "Napa County, CA", "County"] <-"napa"
df1_pt[df1_pt$County == "Orange County, CA", "County"] <-"orange"
df1_pt[df1_pt$County == "Placer County, CA", "County"] <-"placer"
df1_pt[df1_pt$County == "Riverside County, CA", "County"] <-"riverside"
df1_pt[df1_pt$County == "Sacramento County, CA", "County"] <-"sacramento"
df1_pt[df1_pt$County == "San Bernardino County, CA", "County"] <-"san bernardino"
df1_pt[df1_pt$County == "San Diego County, CA", "County"] <-"san diego"
df1_pt[df1_pt$County == "San Francisco County, CA", "County"] <-"san francisco"
df1_pt[df1_pt$County == "San Joaquin County, CA", "County"] <-"san joaquin"
df1_pt[df1_pt$County == "San Luis Obispo County, CA", "County"] <-"san luis obispo"
df1_pt[df1_pt$County == "San Mateo County, CA", "County"] <-"san mateo"
df1_pt[df1_pt$County == "Santa Barbara County, CA", "County"] <-"santa barbara"
df1_pt[df1_pt$County == "Santa Clara County, CA", "County"] <-"santa clara"
df1_pt[df1_pt$County == "Santa Cruz County, CA", "County"] <-"santa cruz"
df1_pt[df1_pt$County == "Shasta County, CA", "County"] <-"shasta"
df1_pt[df1_pt$County == "Solano County, CA", "County"] <-"solano"
df1_pt[df1_pt$County == "Sonoma County, CA", "County"] <-"sonoma"
df1_pt[df1_pt$County == "Stanislaus County, CA", "County"] <-"stanislaus"
df1_pt[df1_pt$County == "Tulare County, CA", "County"] <-"tulare"
df1_pt[df1_pt$County == "Ventura County, CA", "County"] <-"ventura"
df1_pt[df1_pt$County == "Yolo County, CA", "County"] <-"yolo"
df1_pt <- df1_pt %>% mutate(County = str_to_title(County))
df1_pt <- df1_pt %>% filter(!is.na(total_birth)) %>% filter(!is.na(Events)) %>% mutate(rate = Events/total_birth * 10^2)
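When filtering out missing values with `is.na()` inside `filter()`, the column name has to be bare rather than quoted. A small sketch on toy data (the data frame `toy` is made up for illustration):

```r
library(dplyr)

toy <- data.frame(County = c("A", "B"), total_birth = c(100, NA))

# A quoted name is just a string, which is never NA, so no row is dropped.
kept_quoted <- toy %>% filter(!is.na("total_birth"))

# The bare column name tests the column's values, so the NA row is dropped.
kept_bare <- toy %>% filter(!is.na(total_birth))

print(nrow(kept_quoted))
print(nrow(kept_bare))
```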

Sonia’s Data Joining of CDC WONDER Live Birth Data (Low Birth Weight and Preterm Birth) and Pesticide Data: I then joined the CDC WONDER data (low birth weight and preterm birth) with Zainab’s wrangled pesticide data to produce a combined dataset, and generated bar graphs to visualize the trend over 2007-2016 (please see the Shiny app). We noticed that Fresno and Kern were the two counties with the highest pesticide use, and found that the San Joaquin Valley, where both counties lie, is an agriculturally productive region.

county_ranks16 <- read_delim("table1_county_rank_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   COUNTY = col_character(),
##   LBS_2015 = col_double(),
##   RANK_2015 = col_double(),
##   LBS_2016 = col_double(),
##   RANK_2016 = col_double()
## )
repro_lbs16 <- read_delim("table3_reproductive_lbs_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   CHEMICAL = col_character(),
##   LBS_2007 = col_double(),
##   LBS_2008 = col_double(),
##   LBS_2009 = col_double(),
##   LBS_2010 = col_double(),
##   LBS_2011 = col_double(),
##   LBS_2012 = col_double(),
##   LBS_2013 = col_double(),
##   LBS_2014 = col_double(),
##   LBS_2015 = col_double(),
##   LBS_2016 = col_double()
## )
repro_acre16 <- read_delim("table4_reproductive_acres_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   CHEMNAME = col_character(),
##   ACRES_2007 = col_double(),
##   ACRES_2008 = col_double(),
##   ACRES_2009 = col_double(),
##   ACRES_2010 = col_double(),
##   ACRES_2011 = col_double(),
##   ACRES_2012 = col_double(),
##   ACRES_2013 = col_double(),
##   ACRES_2014 = col_double(),
##   ACRES_2015 = col_double(),
##   ACRES_2016 = col_double()
## )
table1_2016 <- county_ranks16 %>% transmute(county = COUNTY, 
                                            lbs_2015 = LBS_2015, rank_2015 = RANK_2015, 
                                            lbs_2016 = LBS_2016, rank_2016 = RANK_2016)

all_dat <- list(read_csv("table1_2007.csv")[1:3],
                read_csv("table1_2008.csv")[1:3],
                read_csv("table1_2009.csv")[1:3],
                read_csv("table1_2010.csv")[1:3],
                read_csv("table1_2011.csv")[1:3],
                read_csv("table1_2012.csv")[1:3],
                read_csv("table1_2013.csv")[1:3],
                read_csv("table1_2014.csv")[1:3],
                read_csv("table1_2015.csv")[1:3])
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2006 = col_double(),
##   rank_2006 = col_double(),
##   lbs_2007 = col_double(),
##   rank_2007 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2007 = col_double(),
##   rank_2007 = col_double(),
##   lbs_2008 = col_double(),
##   rank_2008 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2008 = col_double(),
##   rank_2008 = col_double(),
##   lbs_2009 = col_double(),
##   rank_2009 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2009 = col_double(),
##   rank_2009 = col_double(),
##   lbs_2010 = col_double(),
##   rank_2010 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2010 = col_double(),
##   rank_2010 = col_double(),
##   lbs_2011 = col_double(),
##   rank_2011 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2011 = col_double(),
##   rank_2011 = col_double(),
##   lbs_2012 = col_double(),
##   rank_2012 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2012 = col_double(),
##   rank_2012 = col_double(),
##   lbs_2013 = col_double(),
##   rank_2013 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2013 = col_double(),
##   rank_2013 = col_double(),
##   lbs_2014 = col_double(),
##   rank_2014 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2014 = col_double(),
##   rank_2014 = col_double(),
##   lbs_2015 = col_double(),
##   rank_2015 = col_double()
## )
table1 <- Reduce(function(x, y) left_join(x, y, by = "county"), all_dat)
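`Reduce()` folds the list of yearly tables into one wide table by joining them pairwise on `county`. A minimal sketch with three made-up tables (all values are invented):

```r
library(dplyr)

yearly <- list(
  data.frame(county = c("Kern", "Fresno"), lbs_2007 = c(10, 20)),
  data.frame(county = c("Kern", "Fresno"), lbs_2008 = c(11, 21)),
  data.frame(county = c("Fresno", "Kern"), lbs_2009 = c(22, 12))
)

# Join the first two tables, then join that result with the third, and so on.
wide <- Reduce(function(x, y) left_join(x, y, by = "county"), yearly)
print(wide)
```

Because each step is a left join, counties missing from the first table would never appear in the result, so the first table determines the set of rows.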

long_table1 <- table1 %>% pivot_longer(!county, names_to = 'usage', values_to = "value")
table1_ranks <- long_table1 %>% filter(str_starts(usage, "rank"))
table1_lbs <- long_table1 %>% filter(str_starts(usage, "lbs"))
table1_lbs$usage <- as.numeric(gsub("[^[:digit:]]+", "", table1_lbs$usage))
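The reshaping above turns columns such as `lbs_2007` into rows and then strips the non-digit characters to recover the year. An alternative sketch, assuming all column names follow the `<measure>_<year>` pattern, lets `pivot_longer()` split the name directly; the toy values are invented.

```r
library(tidyr)

toy <- data.frame(county = c("Kern", "Fresno"),
                  lbs_2007 = c(10, 20), rank_2007 = c(1, 2),
                  lbs_2008 = c(11, 21), rank_2008 = c(2, 1))

# names_sep splits "lbs_2007" into measure = "lbs" and year = "2007";
# names_transform converts the year to an integer for later joins.
long <- pivot_longer(toy, !county,
                     names_to = c("measure", "year"), names_sep = "_",
                     names_transform = list(year = as.integer),
                     values_to = "value")
print(long)
```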

long_table2 <- table1_2016 %>% pivot_longer(!county, names_to = 'usage', values_to = "value")
table1_ranks_1516 <- long_table2 %>% filter(str_starts(usage, "rank"))
table1_lbs_1516<- long_table2 %>% filter(str_starts(usage, "lbs"))


table1_lbs_1516$usage <- as.numeric(gsub("[^[:digit:]]+", "", table1_lbs_1516$usage))
combined_pesticide_use <- table1_lbs %>% full_join(table1_lbs_1516) 
## Joining, by = c("county", "usage", "value")
combined_pesticide_use <- combined_pesticide_use %>% group_by(usage) 
combined_pesticide_use <- combined_pesticide_use %>% arrange(usage)

averagebw <-df2 %>% select("County", "Year", "rate")
pesticide_averagebw_join <- averagebw %>% inner_join(combined_pesticide_use, by = c("County" = "county", "Year" = "usage")) 

averagept <-df1_pt %>% select("County", "Year", "rate")
pesticide_averagept_join <- averagept %>% inner_join(combined_pesticide_use, by = c("County" = "county", "Year" = "usage")) 
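Both joins above match on county name and year, so a row drops out silently whenever the two tables spell a county differently. A hedged sketch of a check with `anti_join()`, using invented toy tables:

```r
library(dplyr)

births <- data.frame(County = c("Kern", "Fresno", "Yolo"),
                     Year = c(2016, 2016, 2016),
                     rate = c(7.1, 7.4, 6.0))
pesticide <- data.frame(county = c("Kern", "FRESNO"),
                        usage = c(2016, 2016),
                        value = c(3.0e7, 3.5e7))

keys <- c("County" = "county", "Year" = "usage")
joined <- inner_join(births, pesticide, by = keys)

# Rows of `births` with no partner in `pesticide`: inspect these before plotting.
unmatched <- anti_join(births, pesticide, by = keys)
print(unmatched)
```

In this toy case "Fresno" fails to match "FRESNO", and "Yolo" has no pesticide row at all, so only Kern survives the inner join.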

#bar graph of low birth weight
pesticide_averagebw_join %>% ggplot(aes(County, rate)) + geom_col() + ylab("Low Birth Weight Rate (%)") +xlab("") +
                theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1/2)) 

#bar graph of pesticide 
pesticide_averagept_join %>% ggplot(aes(County, value)) + geom_col() + ylab("Pesticide Use (Pounds)") +xlab("") + 
            theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1/2))

Code for Creating the Map - Leaflet Map Using Pesticide Use Data I Wrangled Earlier (See Shiny App for the Final Result): This is a similar spatial map, but for pesticide use. The code below reads a county shapefile, subsets it to California, and merges that spatial object with the 2016 pesticide use data I wrangled earlier to generate a leaflet map.

averagept_df <- combined_pesticide_use %>% filter(usage == "2016")

map <- readOGR(path.expand("cb_2018_us_county_20m.shp"),
               layer = "cb_2018_us_county_20m", stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\zay-z\Documents\Harvard Chan\Fall 2020\BST260\datascience-project\Data Prep (& Final RMD)\cb_2018_us_county_20m.shp", layer: "cb_2018_us_county_20m"
## with 3220 features
## It has 9 fields
## Integer64 fields read as strings:  ALAND AWATER
Statekey<-read.csv('./STATEFPtoSTATENAME_Key.csv', colClasses=c('character'))
map<-merge(x=map, y=Statekey, by="STATEFP", all=TRUE)
SingleState <- subset(map, map$STATENAME %in% c(
    "California"
))

spatial_pesticide <- sp::merge(x=SingleState, y=averagept_df, by.x="NAME", by.y="county")

binpes <- c(200, 100145, 1131454, 3345277, Inf)
# the break points are given explicitly in binpes, so no bin count is needed
pal3 <- colorBin(
    palette = "magma",
    domain = spatial_pesticide$value, bins=binpes)
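`colorBin()` assigns each county to one of the intervals defined by the break points in `binpes`. To see which interval a value falls into without drawing the map, base R's `cut()` can mimic the binning. This is only an illustration with invented values, assuming left-closed intervals; it is not leaflet's internal code.

```r
binpes <- c(200, 100145, 1131454, 3345277, Inf)
toy_lbs <- c(150, 5000, 250000, 2000000, 40000000)  # invented pounds of pesticide

# right = FALSE gives left-closed intervals [a, b); values below the first
# break point fall outside every bin and come back as NA.
bin_index <- cut(toy_lbs, breaks = binpes, right = FALSE, labels = FALSE)
print(bin_index)
```

A county that used less than 200 pounds would therefore get no bin, which is worth keeping in mind when reading the legend.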

leaflet(spatial_pesticide, options = leafletOptions(zoomControl = TRUE, zoomLevelFixed = FALSE, dragging=TRUE, minZoom = 5.3, maxZoom = 9)) %>% 
                setView(lat = 36.778259, lng = -119.417931, zoom = 6) %>%
                addPolygons(color = "Black", weight = 1, smoothFactor = 0.5, 
                            opacity = 1.0, fillOpacity = 0.5, layerId = ~NAME,
                            fillColor = ~pal3(value), 
                            popup = ~paste0("<center><font size=\"4\"><b>County: </b>", NAME, "</font></center>", "<b>Amount of pesticide used (pounds): </b>", sprintf("%1.2f", value), "<br/>")) %>%
                addLegend(pal = pal3, values = spatial_pesticide$value, opacity = 1, title="Amounts of Pesticide Used (Pounds)")

Regression Analysis

#Top 10 counties in terms of pesticide usage
agro <- c("Kern", "Tulare", "Fresno", "Monterey", "Merced", "Stanislaus", 
          "San Joaquin", "Ventura", "Madera", "Kings")

mch_regression <- MCH.CDC.Data_Race %>% 
  filter(Year == 2016) %>%
  mutate(agricultural = ifelse(County %in% agro, 1, 0))
linmod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, mch_regression)
summary(linmod)[4]
## $coefficients
##                               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)                 -5527.0554  870.44429 -6.349695 3.049477e-09
## Average.LMP.Gestational.Age   227.3201   22.48768 10.108650 3.118120e-18
summary(linmod)[9]
## $adj.r.squared
## [1] 0.4266074
mch_regression %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) + 
  geom_point() + 
  geom_line(aes(y = predict(linmod)))  + 
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Top Ten\nPesticide\nUse", labels = c('No', "Yes")) +
  ggtitle("Birth Weight Outcomes by Pesticide Usage") + 
  ylim(2980, 3520) +
  xlim(37.6, 39.6)
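The coefficients printed above imply roughly 227 g of additional average birth weight per extra week of average gestational age. A worked example using the estimates from that output (the 39-week value is chosen purely for illustration):

```r
# Estimates copied from the coefficient table of `linmod`
intercept <- -5527.0554
slope     <- 227.3201    # grams per week of average LMP gestational age

# Predicted average birth weight (grams) at 39 weeks
pred_39 <- intercept + slope * 39
print(round(pred_39, 1))
```

The large negative intercept has no physical meaning on its own; it is an extrapolation to zero weeks, far outside the 37.6 to 39.6 week range shown in the plot.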

Top-ten pesticide use did not appear to have much effect on average birth weight, so we next added mother’s race to the model.

avg_bw_mod2016 <- lm(Average.Birth.Weight ~ factor(Mothers.Race, ordered = F) + Average.LMP.Gestational.Age, mch_regression)
summary(avg_bw_mod2016)[4]
## $coefficients
##                                                                Estimate
## (Intercept)                                                -3547.882869
## factor(Mothers.Race, ordered = F)Asian or Pacific Islander  -109.533821
## factor(Mothers.Race, ordered = F)Black or African American  -111.909013
## factor(Mothers.Race, ordered = F)White                        -7.310527
## Average.LMP.Gestational.Age                                  177.697257
##                                                            Std. Error
## (Intercept)                                                 642.55388
## factor(Mothers.Race, ordered = F)Asian or Pacific Islander   12.34741
## factor(Mothers.Race, ordered = F)Black or African American   12.46519
## factor(Mothers.Race, ordered = F)White                       12.51234
## Average.LMP.Gestational.Age                                  16.59650
##                                                               t value
## (Intercept)                                                -5.5215337
## factor(Mothers.Race, ordered = F)Asian or Pacific Islander -8.8709968
## factor(Mothers.Race, ordered = F)Black or African American -8.9777224
## factor(Mothers.Race, ordered = F)White                     -0.5842655
## Average.LMP.Gestational.Age                                10.7069102
##                                                                Pr(>|t|)
## (Intercept)                                                1.717557e-07
## factor(Mothers.Race, ordered = F)Asian or Pacific Islander 4.427547e-15
## factor(Mothers.Race, ordered = F)Black or African American 2.426946e-15
## factor(Mothers.Race, ordered = F)White                     5.600389e-01
## Average.LMP.Gestational.Age                                1.217500e-19
summary(avg_bw_mod2016)[9]
## $adj.r.squared
## [1] 0.7199346
# The fitted lines are parallel; Black and Asian/Pacific Islander populations fare the worst
mch_regression %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = Mothers.Race)) + 
  geom_point() + geom_line(aes(y = predict(avg_bw_mod2016))) + 
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Mother's Race") +
  ggtitle("Birth Weight Outcomes by Race") + 
  ylim(2980, 3520) +
  xlim(37.6, 39.6)
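Because the model has no interaction term, every race group shares the same gestational-age slope and differs only in its intercept, which is why the fitted lines are parallel. A worked example with the printed estimates, assuming American Indian or Alaska Native is the reference level (it is the only group without its own coefficient in the output):

```r
# Estimates copied from the coefficient table of `avg_bw_mod2016`
intercept <- -3547.882869
slope     <- 177.697257
# Intercept shifts relative to the assumed reference group
shift <- c("American Indian or Alaska Native" = 0,
           "Asian or Pacific Islander" = -109.533821,
           "Black or African American" = -111.909013,
           "White" = -7.310527)

# Predicted average birth weight (grams) at 39 weeks for each group
pred_39 <- intercept + slope * 39 + shift
print(round(pred_39, 1))
```

The gap between any two groups is the same at every gestational age, since only the `shift` term differs between them.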

Pesticide use did not appear to affect birth weight much, but race did, so we stratified by race.

# For American Indian/Alaska Native mothers, the simple linear model is more parsimonious than the one with the interaction term.
# This group has small populations, though, so I don't really trust any of the models.
amerindian_mod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, filter(mch_regression, Mothers.Race == "American Indian or Alaska Native"))
summary(amerindian_mod)[4] #coefficients
## $coefficients
##                               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)                 -3829.7135 1246.73649 -3.071791 4.495686e-03
## Average.LMP.Gestational.Age   184.9774   32.20391  5.743941 2.857815e-06
summary(amerindian_mod)[9] #adjusted r-squared
## $adj.r.squared
## [1] 0.5078807
q1 <- mch_regression %>%
  filter(Mothers.Race == "American Indian or Alaska Native") %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) + 
  geom_point() + geom_line(aes(y = predict(amerindian_mod)), size = 1) + 
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Top Ten\nPesticide\nUse", labels = c('No', "Yes")) +
  ggtitle("Birth Weight Outcome for American Indian and Alaska Native Mothers") + 
  ylim(2980, 3520) +
  xlim(37.6, 39.6)
# Even simple linear regression explains little of the variation for Asian and Pacific Islander mothers (adjusted R-squared of about 0.10)
asian_mod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, filter(mch_regression, Mothers.Race == "Asian or Pacific Islander"))
summary(asian_mod)[4] #coefficients
## $coefficients
##                               Estimate Std. Error    t value   Pr(>|t|)
## (Intercept)                 -429.60128 1658.17961 -0.2590801 0.79718280
## Average.LMP.Gestational.Age   94.23017   42.87789  2.1976402 0.03509986
summary(asian_mod)[9] #adjusted r-squared
## $adj.r.squared
## [1] 0.1012334
q2 <- mch_regression %>%
  filter(Mothers.Race == "Asian or Pacific Islander") %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) + 
  geom_point() + 
  geom_line(aes(y = predict(asian_mod)), size = 1) + 
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Top Ten\nPesticide\nUse", labels = c('No', "Yes")) +
  ggtitle("Birth Weight Outcome for Asian and Pacific Islander Mothers") + 
  ylim(2980, 3520) +
  xlim(37.6, 39.6)
#simple linear model best for Black mothers
black_mod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, filter(mch_regression, Mothers.Race == "Black or African American"))
summary(black_mod)[4] #coefficients
## $coefficients
##                               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)                 -4187.3941 1234.47274 -3.392051 1.816634e-03
## Average.LMP.Gestational.Age   191.3651   31.97847  5.984186 1.010634e-06
summary(black_mod)[9] #adjusted r-squared
## $adj.r.squared
## [1] 0.5058892
q3 <- mch_regression %>%
  filter(Mothers.Race == "Black or African American") %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) + 
  geom_point() + geom_line(aes(y = predict(black_mod)), size = 1) + 
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Top Ten\nPesticide\nUse", labels = c('No', "Yes")) +
  ggtitle("Birth Weight Outcome for Black Mothers") +
  ylim(2980, 3520) +
  xlim(37.6, 39.6)
# Simple Linear Regression best shows relationship for White mothers
white_mod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, filter(mch_regression, Mothers.Race == "White"))
summary(white_mod)[4] #coefficients
## $coefficients
##                               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)                 -4461.0894 1011.08034 -4.412201 1.030204e-04
## Average.LMP.Gestational.Age   201.0204   26.03101  7.722343 6.794002e-09
summary(white_mod)[9] #adjusted r-squared
## $adj.r.squared
## [1] 0.6329664
q4 <- mch_regression %>%
  filter(Mothers.Race == "White" ) %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) + 
  geom_point() + geom_line(aes(y = predict(white_mod)), size = 1) + 
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Top Ten\nPesticide\nUse", labels = c('No', "Yes")) +
  ggtitle("Birth Weight Outcome for White Mothers") +
  ylim(2980, 3520) +
  xlim(37.6, 39.6)
q1

q2

q3

q4

# Refit without Imperial County: adjusted R-squared rises from about 0.10 to about 0.33; Imperial also has a small population in general
asian_mod_no7 <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, filter(mch_regression, Mothers.Race == "Asian or Pacific Islander" & County != "Imperial"))
summary(asian_mod_no7)[4]
## $coefficients
##                              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)                 -4485.823 1854.93330 -2.418320 0.0214593571
## Average.LMP.Gestational.Age   199.252   47.99022  4.151929 0.0002281163
summary(asian_mod_no7)[9]
## $adj.r.squared
## [1] 0.329793
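# The filter must match the data used to fit asian_mod_no7 so that predict() lines up with the plotted rows
mch_regression %>%
  filter(Mothers.Race == "Asian or Pacific Islander" & County != "Imperial") %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) +
  geom_point() +
  geom_line(aes(y = predict(asian_mod_no7)), size = 1) +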
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Top Ten\nPesticide\nUse", labels = c('No', "Yes")) +
  ggtitle("Birth Weight Outcome for Asian and Pacific Islander Mothers (no Imperial)") +
  ylim(2980, 3520) +
  xlim(37.6, 39.6)
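
Imperial was dropped above on the basis of its small population and the improved fit. One complementary way to check whether a single observation is driving a regression is Cook's distance. The sketch below is not the procedure used in this report; it uses synthetic data with one deliberately aberrant point, and the `4 / n` cutoff is a common rule of thumb rather than a formal test.

```r
# Illustrative only: synthetic data with one planted influential point.
set.seed(2)
x <- runif(30, 37.6, 39.6)
y <- -4000 + 190 * x + rnorm(30, sd = 30)
# Row 31 is a hypothetical outlier: low gestational age, high birth weight.
toy <- data.frame(x = c(x, 37.7), y = c(y, 3500))

fit_all <- lm(y ~ x, data = toy)

# Cook's distance measures how much the fit changes when each row is deleted.
cd <- cooks.distance(fit_all)
flagged <- which(cd > 4 / nrow(toy))  # rule-of-thumb cutoff (assumption)
print(flagged)

# Refit without the flagged rows; setdiff() stays safe even if nothing is flagged.
keep <- setdiff(seq_len(nrow(toy)), flagged)
fit_trim <- lm(y ~ x, data = toy[keep, ])

r2 <- c(all = summary(fit_all)$adj.r.squared,
        trimmed = summary(fit_trim)$adj.r.squared)
print(r2)
```

A flagged point is only a prompt to look closer. Whether to exclude it should still rest on a substantive reason, such as the small population noted for Imperial.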

Checking Assumptions for Linear Models

plot(linmod)

hist(rstudent(linmod), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)

plot(avg_bw_mod2016)

hist(rstudent(avg_bw_mod2016), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)

plot(asian_mod)

hist(rstudent(asian_mod), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)

plot(asian_mod_no7)

hist(rstudent(asian_mod_no7), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)

plot(amerindian_mod)

hist(rstudent(amerindian_mod), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)

plot(black_mod)

hist(rstudent(black_mod), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)

plot(white_mod)

hist(rstudent(white_mod), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)
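
The diagnostic calls above repeat the same two steps for every model. A small helper could bundle them. The sketch below is one possible version; the function name `check_lm` and its `label` argument are hypothetical, and it is demonstrated on the built-in `cars` data rather than on the models in this report.

```r
# Hypothetical helper: default lm diagnostics plus a histogram of
# externally studentized residuals with a standard normal overlay.
check_lm <- function(model, label) {
  op <- par(mfrow = c(2, 2))   # show the four default diagnostic plots together
  on.exit(par(op))             # restore graphics settings when the function exits
  plot(model)
  par(mfrow = c(1, 1))
  res <- rstudent(model)
  hist(res, probability = TRUE, col = "pink",
       main = paste("Externally Studentized Residuals:", label))
  curve(dnorm, from = -4, to = 4, add = TRUE)
  invisible(res)               # return the residuals for further checks
}

# Usage on a built-in data set (not the birth weight data).
toy_mod <- lm(dist ~ speed, data = cars)
res <- check_lm(toy_mod, "cars example")
```

With such a helper, each model in this section would need a single call, for example `check_lm(white_mod, "White mothers")`.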

Adding an indicator for the top-ten pesticide-use counties does not improve any of the models.
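
One way to make this kind of comparison explicit is a nested-model F test with `anova()`, alongside the adjusted R-squared of each fit. The sketch below uses synthetic data in which the indicator is simulated to have no true effect; the column names `gest_age`, `birth_weight`, and the simulated coefficients are assumptions for the example, not results from this analysis.

```r
# Illustrative only: synthetic data where the indicator has no real effect.
set.seed(3)
n <- 40
toy <- data.frame(
  gest_age = runif(n, 37.6, 39.6),
  agricultural = rep(c(0, 1), length.out = n)
)
toy$birth_weight <- -4000 + 190 * toy$gest_age + rnorm(n, sd = 40)

# Base model and the same model with the indicator added.
base_mod <- lm(birth_weight ~ gest_age, data = toy)
full_mod <- lm(birth_weight ~ gest_age + factor(agricultural), data = toy)

# F test for whether the extra term reduces the residual sum of squares
# by more than chance alone would suggest.
cmp <- anova(base_mod, full_mod)
print(cmp)

# Adjusted R-squared penalizes the extra parameter, so it can go down.
adj <- c(base = summary(base_mod)$adj.r.squared,
         full = summary(full_mod)$adj.r.squared)
print(adj)
```

A large p-value in the second row of the table, together with an adjusted R-squared that barely moves, would be consistent with the indicator adding nothing to the model.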